7.22.1 [10] <7.11>
In future systems, we expect to see heterogeneous computing platforms constructed out of heterogeneous CPUs. We have begun to see some appear in the embedded processing market in systems that contain both floating point DSPs and microcontroller CPUs in a multichip module package. Assume that you have three classes of CPU:

CPU A: A moderate speed multi-core CPU (with a floating point unit) that can execute multiple instructions per cycle.
CPU B: A fast single-core integer CPU (i.e., no floating point unit) that can execute a single instruction per cycle.
CPU C: A slow vector CPU (with floating point capability) that can execute multiple copies of the same instruction per cycle.

Assume that our processors run at the following frequencies:

CPU A can execute 2 instructions per cycle, CPU B can execute 1 instruction per cycle, and CPU C can execute 8 instructions per cycle (all 8 being copies of the same instruction). Assume all operations complete in a single cycle of latency without any hazards.
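Under the single-cycle, no-hazard assumption above, each CPU's peak rate is simply its instructions per cycle multiplied by its clock frequency. A minimal sketch of that arithmetic follows. The helper names `peak_rate` and `peak_rates` are hypothetical, and the clock values in the example are equal placeholders chosen only for illustration, not the frequencies given in the exercise.

```python
# Instructions per cycle, taken from the exercise text.
IPC = {"A": 2, "B": 1, "C": 8}


def peak_rate(ipc, freq_hz):
    """Peak instructions per second, assuming single-cycle latency and no hazards."""
    return ipc * freq_hz


def peak_rates(freqs_hz):
    """Map each CPU name to its peak rate for the given clock frequencies (Hz)."""
    return {cpu: peak_rate(IPC[cpu], freqs_hz[cpu]) for cpu in IPC}


if __name__ == "__main__":
    # Placeholder clocks (all equal), NOT the values from the exercise.
    placeholder_freqs = {"A": 100e6, "B": 100e6, "C": 100e6}
    for cpu, rate in peak_rates(placeholder_freqs).items():
        print(f"CPU {cpu}: {rate:.3e} instructions/s")
```

With equal clocks the peak rates stand in the ratio 2:1:8. Substituting the actual frequencies changes that ratio, and CPU C's figure applies only when all 8 lanes execute the same instruction.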

All three CPUs have the ability to perform integer arithmetic, though CPU B cannot perform floating point arithmetic. CPU A and B have an instruction set similar to a MIPS processor. CPU C can only perform floating point add and subtract operations, as well as memory loads and stores. Assume all CPUs have access to shared memory and that synchronization has zero cost.

The task at hand is to compare two matrices X and Y that each contain 1024 × 1024 floating point elements. The output should be a count of the number of indices where the value in X was greater than or equal to the value in Y.
Describe how you would partition the problem on the 3 different CPUs to obtain the best performance.
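For reference, the computation being partitioned can be sketched as below: a sequential count of positions where X >= Y, plus a row-block split whose partial counts sum to the same total. This is an illustration of the task, not the intended solution. The function names and the weights passed to `split_rows` are assumptions, and the sketch deliberately ignores the capability limits stated above (CPU B has no floating point unit; CPU C supports only floating point add, subtract, load, and store), which any real partition has to respect.

```python
import random


def count_ge(x_rows, y_rows):
    """Count positions where the element of X is >= the matching element of Y."""
    return sum(
        1
        for x_row, y_row in zip(x_rows, y_rows)
        for x, y in zip(x_row, y_row)
        if x >= y
    )


def split_rows(n_rows, weights):
    """Split range(n_rows) into contiguous (lo, hi) blocks proportional to weights."""
    total = sum(weights)
    bounds = [0]
    acc = 0
    for w in weights[:-1]:
        acc += w
        bounds.append(round(n_rows * acc / total))
    bounds.append(n_rows)  # the last block always ends at the final row
    return [(bounds[i], bounds[i + 1]) for i in range(len(weights))]


def partitioned_count(x, y, weights):
    """Sum per-block counts; each block could run on a different CPU."""
    return sum(
        count_ge(x[lo:hi], y[lo:hi]) for lo, hi in split_rows(len(x), weights)
    )


if __name__ == "__main__":
    random.seed(0)
    n = 64  # small stand-in for the 1024 x 1024 matrices in the exercise
    x = [[random.random() for _ in range(n)] for _ in range(n)]
    y = [[random.random() for _ in range(n)] for _ in range(n)]
    # Hypothetical weights; real ones would come from each CPU's usable rate.
    weights = [2, 1, 8]
    assert partitioned_count(x, y, weights) == count_ge(x, y)
    print(split_rows(1024, weights))
```

Because synchronization is assumed to cost nothing and the blocks are independent, the only coordination needed is the final sum of the partial counts.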
 
 